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ABSTRACT 

We present a framework for designing efficient distributed 
data structures for multi-dimensional data. Our structures, 
which we call skip-webs, extend and improve previous ran- 
domized distributed data structures, including skipnets and 
skip graphs. Our framework applies to a general class of data 
querying scenarios, which include linear (one-dimensional) 
data, such as sorted sets, as well as multi-dimensional data, 
such as d-dimensional octrees and digital tries of character 
strings defined over a fixed alphabet. We show how to per- 
form a query over such a set of n items spread among n 
hosts using ©(log n/ log log n) messages for one-dimensional 
data, or O(logn) messages for fixed-dimensional data, while 
using only O(logn) space per host. We also show how to 
make such structures dynamic so as to allow for insertions 
and deletions in 0(log n) messages for quadtrees, octrees, 
and digital tries, and 0(log n/ log log n) messages for one- 
dimensional data. Finally, we show how to apply a blocking 
strategy to skip- webs to further improve message complexity 
for one- dimensional data when hosts can store more data. 

Keywords: Distributed data structures, peer-to-peer net- 
works, skip lists, quadtrees, octrees, tries, trapezoidal maps. 

1. INTRODUCTION 

Peer-to-peer networks offer a decentralized, distributed 
way of storing large data sets. They store data at the hosts 
in a network and they allow searches to be implemented 
by sending messages between hosts, so as to route a query 
to the host that stores the requested information. That is, 
data is stored at the nodes of a network according to some 
indexing strategy organized over a set of data attributes, 
with the desire that users should be able to quickly access 
this data using attribute-based queries. In this paper, we 
are interested in allowing for a fairly rich set of possible 
data queries, including an exact match for a key (e.g., a file 



Permission to make digital or hard copies of all or part of this work for 
personal or classroom use is granted without fee provided that copies are 
not made or distributed for profit or commercial advantage and that copies 
bear this notice and the full citation on the first page. To copy otherwise, to 
republish, to post on servers or to redistribute to lists, requires prior specific 
permission and/or a fee. 

PODC'05. July 17-20, Las Vegas, Nevada, USA. 
Copyright 2005 ACM 1-59593-994-2/05/0007 ...$5.00. 



name), a prefix match for a key string, a nearest-neighbor 
match for a numerical attribute, a range query over various 
numerical attributes, or a point-location query in a geomet- 
ric map of attributes. That is, we would like the peer-to-peer 
network to support a rich set of possible data types that al- 
low for multiple kinds of queries, including set membership, 
1-dim. nearest neighbor queries, range queries, string prefix 
queries, and point-location queries. 

The motivation for such queries include DNA databases, 
location-based services, and approximate searches for file 
names or data titles. For example, a prefix query for ISBN 
numbers in a book database could return all titles by a cer- 
tain publisher. Likewise, a nearest-neighbor query in a two- 
dimensional point set could reveal the closest open com- 
puter kiosk or empty parking space on a college campus. 
By allowing for such databases to be stored, searched, and 
updated in a distributed peer-to-peer (p2p) network, appli- 
cation providers can maximize information availability and 
accuracy at low cost to any individual host participant (as- 
suming honest participants, of course^). 

The key challenges in setting up such p2p networks is to 
balance the space and processing loads of the hosts while 
providing for low message costs for routing queries and per- 
forming insertion and deletion updates in the data set. We 
would like the data to be distributed as evenly as possible 
across the network, so that the space-usage load of each host 
is roughly the same. And we desire that the data manage- 
ment be fully decentralized, with hosts able to insert and 
delete data they own without employing a centralized data 
manager or repository (which would create a single point of 
failure). Likewise, we would like the indexing structure to be 
distributed in such a way that the query-processing load is 
also spread as uniformly as possible across the nodes of the 
network. This paper is therefore directed at the study of dis- 
tributed peer-to-peer data structures for multi-dimensional 
data. 

1.1 Cost Measures and Models for Peer-to- 
Peer Data Structures 

There are a number of cost measures for quantifying the 
quality of a distributed data structure for an p2p network. 
First and foremost, we have the following two parameters: 



^The security of distributed data structures is an interesting 
research topic Q |S1 I19| . but is beyond the scope of this 
paper. 
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Figure 1: A skip list data structure. Each element exists in the bottom-level list, and each node on one level 
is copied to the next higher level with probability 1/2. A search starts at the top and proceeds as far as it 
can on a given level, then drops down to the next level, and continues until it reaches the desired node on 
the bottom level. The expected query time is O(logn) and the expected space is 0{n). 



• H: the number of hosts. Each host is a computer on 
the network with a unique ID. We assume the network 
allows any host to send a message to any other host. 
We also assume that hosts do not fail. 

• M: the maximum memory size of a host. The memory 
size is measured by the number of data items, data 
structure nodes, pointers, and host IDs that any host 
can store. 

In addition to the parameters H and M, we have the fol- 
lowing cost functions for a distributed data structure storing 
a set 5 of n items in a peer-to-peer network: 

• Q(n): the query cost — the number of messages needed 
to process a query on S. 

• U(n): the update cost — the number of messages needed 
to insert a new item in the set S or remove an item 
from the set S. 

• C{n): the congestion per host — the sum of the number 
of references to items stored at the host, the number 
of references to items stored at other hosts, and the 
number n/H (which measures the expected number 
of queries likely to begin at any host, based on the 
number of items in the set S). 

We assume that each host has a reference to the place where 
any search from that host should begin, i.e., a root node for 
that host (which may be stored on the host itself). 

1.2 Previous Related Work 

We are not aware of any previous related work on storing 
multi-dimensional data in peer-to-peer networks. 

Nevertheless, there are existing efficient structures for stor- 
ing and retrieving one-dimensional data in peer-to-peer net- 
works. For example, there is a considerable amount of prior 
work on variants of Distributed Hash Tables (DHTs), in- 
cluding Chord QIIHl, Koorde [HI, Pastry Scribe [HI, 
Symphony [12J, and Tapestry '211, to name just a few. These 
structures do not allow for sophisticated queries, however, 



such as nearest-neighbor searching (even in one-dimension), 
string prefix searching, or multi-dimensional point location. 

For one- dimensional nearest-neighbor searching, Aspnes 
and Shah P] present a distributed data structure, called 
skip graphs, for searching ordered content in a peer-to-peer 
network, based on the elegant, randomized skip-list data 
structure |15| . (See Figure 0) Harvey et al. |10| inde- 
pendently present a similar structure, which they call Skip- 
Net. These structures achieve O(logn) congestion, expected 
query time, and expected update times, using n hosts. Har- 
vey and Munro present a deterministic version of Skip- 
Net, showing how updates can be performed efficiently, al- 
beit with higher congestion and update costs, which are 
O(log^n). Zatlukal and Harvey |2UI show how to modify 
SkipNet, to construct a structure they call family trees, to 
achieve O(logn) expected time for search and update, while 
restricting the number of pointers from one host to other 
hosts to be 0(1). Manku, Naor, and Wieder show how 
to improve the expected query cost for searching skip graphs 
and SkipNet to 0(log n/ log log n), by having hosts store the 
pointers from their neighbors to their neighbor's neighbors 
(i.e., neighbors-of-neighbors (NoN) tables); see also Naor 
and Wieder jfTTi . Unfortunately, this improvement requires 
that the memory size, congestion, and expected update time 
grows to be O(log^n). Focusing instead on fault tolerance, 
Awerbuch and Scheideler |1| show how to combine a skip 
graph/SkipNet data structure with a DHT to achieve im- 
proved fault tolerance^, but at an expense of a logarithmic 
factor slow-down for queries and updates. Aspnes et al. [5| 
show how to reduce the space complexity of the skip graphs 
structure, by bucketing intervals of keys on the "bottom 
level" of their structure. Their method improves the ex- 
pected bounds for searching and updating, but only by a 
constant factor whenever H is Q{n'^), for constant e > 0. 



^Achieving improved fault tolerance in peer-to-peer data 
structures for multi-dimensional data is a topic for possi- 
ble future research. 
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Table 1: Comparison of 1-dimensional skip- webs with previous methods for 1-dimensionaI nearest neighbor 
structures. We use 0{*) to denote an expected cost bound. The skip-webs and bucket skip-webs solutions 
are presented in this paper. 



1.3 Our Results 

In this paper, we present a framework for designing ran- 
domized distributed data structure that improves previous 
skip-graph/SkipNet approaches and extends their area of 
applicability to multi-dimensional data sets. The queries 
allowed include one-dimensional nearest neighbor queries, 
string searching over fixed alphabets, and multi-dimensional 
searching and point location. Our structure, which we call 
skip-webs, matches the 0(log n/ log log n) expected query 
time of NoN skip-graphs |13l I14| for one-dimensional data, 
while maintaining the O(logn) memory size and expected 
query cost of traditional skip graphs [2] and SkipNet |10| . 
We also introduce a bucketed version of our skip-web struc- 
ture, which improves the overall space bounds of our struc- 
ture, while also significantly improving the expected query 
and update times. Indeed, if the memory size M of our hosts 
is 0(n'^), for constant e > 0, then our methods result in 
expected constant-cost methods for performing queries and 
updates. In Table we highlight how our methods com- 
pare with previous solutions for one-dimensional nearest- 
neighbor searches. 

In addition to improving previous randomized distributed 
data structures for one-dimensional data, we also extend 
the areas of applicability for such structures, by introducing 
a general framework for constructing distributed structures 
for multi-dimensional data sets, including d-dimensional point 
sets, sets of character strings over fixed alphabets, and (trape- 
zoidal) maps of planar subdivisions defined by disjoint line 
segments in the plane (e.g., as would be created by a cam- 
pus or city map in a geographic information system). Our 
approach is based on viewing the nodes and links of an un- 
derlying data structures as having ranges (that is, sets) as- 
sociated with them. By then applying a random selection 
process to these range spaces we produce a distributed data 
structure (with low congestion) that is able to perform effi- 
cient queries on the underlying structure. For example, we 
can locate a point in a distributed two-dimensional quadtree 
of n points using only 0{log n) messages, even if the un- 
derlying quadtree has 0{n) depth, providing a distributed 
analogue to the skip quadtree data structure of Eppstein et 
al. IS]. 

2. THE SKIP- WEB FRAMEWORK 

The skip-web framework applies to a wide variety of data 
querying scenarios. In this section, we define the conditions 



that a data querying scenario must satisfy in order to allow 
for a skip-web implementation. Intuitively, our conditions 
apply to any data structure built deterministically from an 
input set S using nodes and links between those nodes. Our 
scheme builds a hierarchical, distributed structure in a way 
that allows for fast queries. 

2.1 Range-Determined Link Structures 

We consider the input set of n items to be a ground set S 
of items taken from some universe U. In order for the skip- 
web framework to apply, we require that S and U define 
a unique link structure That is, the universe U and 

the elements of S determine a unique data structure D{S) 
consisting of nodes and links connecting pairs of those nodes. 
Furthermore, we assume that each node and link in D{S) is 
associated with a range (that is, a set) of values from U and 
there is an incidence between a node and a link if and only 
if their respective ranges have a non-empty intersection. 

For example, if [/ is a one-dimensional total order and S 
is a finite subset of U, then D could be an ordered linked list 
on the elements from S. In this case, the range associated 
with each node would be a unique element from S and the 
range associated with a link joining two nodes storing x 
and y, respectively, would be the interval [x,y]. Such a 
linked list D{S) is therefore a unique link structure with 
the incidences of its nodes and links being defined by their 
associated ranges. In general, we call such a structure a 
range-determined link structure. 

As another example, consider D{S) as a digital trie for 
a set 5 of n character strings, defined for a fixed alphabet. 
The range associated with a node v in D{S) would natu- 
rally correspond to the singleton set containing the string 
that leads to v in D{S). Likewise, the range associated with 
an edge (v, w) in D{S) would naturally correspond to the 
set of all strings of the form xy, where x is the string that 
leads to v and j/ is a (possibly empty) prefix of the string 
associated with the edge {v,w). Thus, the intersection re- 
lationships between these sets of strings gives rise to the 
incidences between the nodes and links in D{S). 

2.2 Defining a Set-Halving Lemma 

Let 5* be a ground set of n items taken from some universe 
U, and let D{S) be the range-determined link structure for 
S. Further, let T be a subset of S and let D{T) be the 
range-determined link structure for S (defined by the same 
data-structure construction function, D). 
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Figure 2: A one-dimensional skip-web. Each node is copied to the next level 0-list or 1-list, respectively, with 
probability 1/2. The different levels of the skip-web are separated by dashed lines. Note that if we follow 
pointers down from any top-level node, the structure "looks" like a skip-list. The nodes shown in gray could 
belong to a single host in the skip-web. (A skip-graph instead stores "towers" of nodes with the same key at 
the same host.) 



We say that a range Q for a node or link in D{T) conflicts 
with a range R for a node or link in -0(5*) if Q n i? 7^ (it 
is possible that Q = R, which we still count as a conflict). 
For a given range Q for D{T), we let C{Q, S) denote the set 
of all ranges for D{S) that conflict with Q. We call this set 
the conflict list for Q. 

The range-determined link structures we consider in this 
paper all have an important property with respect to the 
way that ranges conflict when we randomly halve a given 
ground set. In particular, we utilize the following: 

• Template for a Set Halving Lemma: Given a set S 
of n items taken from U, let T be a subset of n/2 
items chosen uniformly at random from S. If there 
exists a constant c > such that we can show that 
E{\C{Q, S)\) < c, for any item q £ U and for the 
maximal range Q of D{T) containing q, then we say 
that U and D have a set halving lemma. 

We demonstrate set halving lemmas for a immber of range- 
determined link structures, including: 

1. compressed quadtrees and octrees for points in TV' 

2. compressed digital tries for character strings over fixed 
alphabets 

3. linked lists of sorted sets 

4. Trapezoidal diagrams of non-crossing sets of line seg- 
ments in the plane. 

For example, we have the following: 

Lemma 1. Let U be a total order, let S be a set ofn items 
from U , let q be an arbitrary item in U, and let T be a subset 
of n/2 items chosen uniformly at random from S. Further, 
let D be the range- determined linked structured defined by a 
doubly-linked list, and let Q be the maximal range in D{T) 
containing q. Then E{\C{Q, S)\) is 0{1). 



Proof. For any x in S, the probability that x belongs 
to Q is 2-lt'''"^)"'^!. Thus, E{\Q n 5"!) is a sum of inverse 
powers of two, in which each power appears at most twice; 
therefore, E{\Q n 5*1) < 4. \C{Q,S)\ < 2\Q n S\ ~ 1, so 
£;(|C(Q,5')|) < 2£:(|QnS|) - 1<= 7. □ 

This is the set halving lemma for a sorted linked list. 
Rather than immediately showing that the remaining range- 
determined link structures above also have set halving lem- 
mas, let us continue our discussion with the way a range- 
determined link structure with a set halving lemma can be 
used to construct an efficient distributed data structure. We 
use the sorted linked list as a running example, but the con- 
struction applies to other link structures as well. 

2.3 Skip- Web Levels 

Suppose, then, that we are given a set S of n elements, 
taken from a universe U, with a range-determined link struc- 
ture D with a set-halving lemma. We define So = 5* to be 
the level-0 subset of 5* and we let D{So) be the level-0 struc- 
ture for 5*0. For example, with a set So taken from a total 
order, D{So) is the sorted linked list defined on So. 

Suppose we have inductively defined a level-i set Sb, where 
b is an {i -\- l)-bit binary string. We generate a new random 
bit for each x in Sb. If the bit for x is 0, then we include a; in a 
set, Sbo', otherwise, we include x in a set, Sbi. For each range 
Q for D{Sbo) we store hyperlink pointers to the nodes and 
links for the ranges in C{Q, Sb), that is, the ranges in D(Sb) 
that conflict with Q. In this case, a pointer consists of a pair 
{h, a), where h is the ID of a host and a is an address on that 
host where the item being referred to is stored (we assume 
that links have an address). Likewise, for each range Q in 
D{Sbi) we store hyperlink pointers to the ranges in C{Q, Sb), 
i.e., in D{Sb), that conflict with Q. We repeat this process 
until we define sets with [log n] -bit indices. The expected 
size of each top-level structure is 0(1), if the total number 
of levels is O(logn). (See Figure|5|) 



2.4 Distributed Blocking 

Given a range-determined link structure and a ground set 
S with a set-halving lemma, the final ingredient for con- 
structing a skip web is to assign the nodes and links of the 
various levels to hosts in the network. For now, let us sim- 
ply assume that this assignment is arbitrary; so that each 
host on the network gets 0{M) nodes and links from among 
the 0(n log n) possible. So, for example, if AI = logn, 
then we can use n hosts to store the data structure using 
O(logn) space per node. This allocation, and the generality 
of the skip- web approach, leads to fast query times for sev- 
eral multi-dimensional distributed search problems, which 
we detail in Section |H] 

2.4.1 Improved Blocking for One-Dimensional Data 

For one-dimensional data, the above arbitrary blocking 
approach would match the performance of skip-graph 
and SkipNet 110' structures. We can improve this blocking 
strategy for one-dimensional data, however, by a more clever 
approach, thereby constructing a distributed data structure 
for one-dimensional data that is more efficient than skip- 
graphs and SkipNet and faster than family trees |20| . As 
with these other structures, the queries our structure sup- 
ports are one-dimensional nearest-neighbor queries, which is 
equivalent to a one-dimensional point-location query, that is, 
given a point x in U, find a range R for a node or link in 
D{S) that contains x. For any universe U of one-dimensional 
data and link structure D based on a doubly-linked list, we 
desire that the expected number of messages to answer a 
range- containment query is 0(log n/ log M), where M is the 
memory size of each host. 

Suppose, then, that we are given a set S of n elements 
taken from a one-dimensional universe U, and we use an 
ordered doubly-linked list D as our range-determined link 
structure. By Lemma Q this set and structure have a set- 
halving lemma. As in the general case, we define So = 5 to 
be the level-0 subset of S and we let D{So) be the level-0 
structure for 5*0, which, in this case, is a simple linked list 
on 5*0. 

Recall that we inductively defined a level-i sets St, where 
b is an (i 4- l)-bit binary string using random sampling. We 
store the data structure nodes and links of each D{Sb) in a 
hierarchical, stratified fashion. We define level i set as basic 
if i is a multiple of L = [log M~\ . If Sb is a basic set, then 
we divide D{Sb) into M/L blocks of contiguous ranges, that 
is, a contiguous portion of the nodes and links in the linked 
list D{Sb). 

Let TZ be the contiguous set of ranges of a basic D{Sb) 
stored on some host h. We also store at h all the ranges of 
D{Sbo) that confiict with ranges in TZ as well as the ranges of 
■p(Sbi) that conflict with ranges in TZ. In fact, we store at h 
the ranges in D{Sboo), D{Sboi), DP{Sbio), and D{Sbii) that 
confiict with ranges on the next lower level, and we continue 
this process for all the non-basic levels above i. (Note that 
copies of some of these ranges may be stored on multiple 
hosts, but these overlapping copies will only increase the 
total space by a constant factor, since these sets of ranges 
are contiguous and compact.) This construction requires 
space proportional to M, as required. The number of hosts 
needed is H < cnlogn/ M , for some constant c. Thus, H 
is n if we choose M to be O(logn). The in-degree and out- 
degree of each nodes is 0{M), as well. 



2.5 Answering Queries 

To answer a query q (which may be over multiple at- 
tributes) in a skip-web we begin with the "root" nodes and 
links of a level-fc D{Sb) stored on the host originating the 
query. We search in D{Sb) as far as we can for q, stopping at 
a node or link whose range R includes q or intersects q. We 
then follow the hyperlinks for R to the nodes of the level- 
(fc — 1) structure, possibly stored at another host. If the hy- 
perlinks are stored on our current host, then we follow these 
connections internally. Otherwise, following these hyper- 
links is done by sending a message for q to the host(s) stor- 
ing the hyperlinks, asking the host(s) to process the query 
as far as they can internally and then return either an an- 
swer or hyperlinks to other hosts at which we can continue 
the search. We continue following all the active hyperlinks 
in this way, possibly sending messages to other hosts, until 
we complete all the processing for the query q. 

The key observation for analyzing the message complex- 
ity for answering such a query is to note that we have to 
consider an expected constant number of conflicting ranges 
in going from one level to the next (by following hyper- 
links leading from this level to ultimately yield answers) will 
be processed, e.g., just one hyperlink in a one-dimensional 
nearest-neighbor search. In the case of general skip-web 
structures, this implies a query time of 0(log n) message per 
answer. In the case of bucketed one-dimensional data, we 
note that we need follow an expected constant number of ex- 
ternal hyperlinks (to new hosts to determine which pointer 
to follow in our search) for basic levels. The pointers we 
need to follow for non-basic levels need not branch out to 
other hosts, by construction. Thus, for example, if M is 
O(logn), then we can answer nearest-neighbor queries using 
0(log n/ log log n) expected message complexity. We sum- 
marize: 

Theorem 2. Given a set S of n elements taken from 
a universe U, and having a range- determined link struc- 
ture with a set-halvmg lemma, then we can construct a dis- 
tributed skip-web structure that uses n hosts, with mem- 
ory size O(logn), congestion O(logn), and query complex- 
ity 0(log n), which can be improved to 0(log n/ log log n) for 
one- dimensional data. 

For general values of the host memory size, M, when we 
are storing one-dimensional data, we get a structure we call 
the bucket skip-web, whose performance is as mentioned in 
Table El 

3. MULTI-DIMENSIONAL SKIP WEBS 

Having discussed the general skip- web framework and how 
to optimize it for one-dimensional data, we discuss in this 
section how to construct distributed skip webs for specific 
multi-dimensional data sets. 

3.1 d-Dimensional Point Sets: Quadtrees and 
Octrees 

In this section, we outline a construction of a skip- web for 
compressed quadtrees and octrees in d-dimensional space, 
for any fixed constant d > 2. A quadtree (for two-dimensional 
data) or octree (for higher dimensions) is defined in terms 
of a set 5* of n points and a bounding hypercube (a square 
in two-dimensions). The root r of the tree is associated 
with the bounding cube. This cube is subdivided into 2^* 
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Figure 3: Illustrating the halving lemma for quadtrees, (a) the regions for a compressed quadtree defined on 
a set S of points; (b) the compressed quadtree for the points in S; (c) an example random subset T of the 
points in S and their associated quadtree regions; (d) the compressed quadtree for the points in T. 



subcubes with side-length half of that of the bounding cube 
and each non-empty subcube is a child of r and is the root 
of a tree that is recursively subdivided in the same manner. 
This tree is then converted into a compressed version by 
compressing chains of nodes with only one child into edges. 
This tree has 0(n) nodes and links, but can have depth 
0(n) in the worst case. More importantly, for the context 
of this paper, a compressed quadtree or octree is a range- 
determined link structure. 

By showing that quadtrees and octrees have a set-halving 
lemma, we show that we can build a skip-web structure for 
multi-dimensional point sets that can perform point-location 
in the subdivision of space defined by the hypercubes as- 
sociated with nodes in the tree. This point location can 
be done with O(logn) messages and, as Eppstein et al. 
show, such point location queries can be used to answer ap- 
proximate nearest-neighbor queries and approximate range 
searches in d- dimensional space. In the case of defining a 
range-determined link structure from a quadtree or octree, 
the range associated with each node is its associated cube 
and the range associated with a link is that of the child 



node for that link. We can define a set-halving lemma for 
quadtrees and octrees as follows. (See Figure 

Lemma 3. Let S be any set of n points in d-dimensional 
space, for fixed constant d > 2, let T be formed by choos- 
ing each member of S independently with probability 1/2, 
and let C be a (hyper)cube associated with a node in the 
quadtree/octree D{T). Then the expected number of node 
hypercubes m D{S) that conflict with C is 0(1). 

Proof. Follows from results from Eppstein et a/. 0. □ 

This lemma and the skip-web framework implies that we 
can perform point-location queries in a network of n hosts 
with 0(log n) messages (even if the underlying quadtree or 
octree has depth 0{n)). 

3.2 Character Strings: Tries 

We have already discussed above that a compressed trie 
defined by a set 5* of n character strings defined over a fixed 
alphabet is a range-determined link structure. In this sec- 
tion, we show that tries defined over such a set S has a 
set-halving lemma. 



Figure 4: An example trapezoidal map. 



Lemma 4. Let S he any set of n character strings over 
a fixed alphabet, let T be formed by choosing each member 
of S independently with probability 1/2, and let s be a set 
of strings associated with a node or link in D{T). Then the 
expected number of ranges (i.e., nodes and links) in D{S) 
that conflict with s is 0(1). 

Proof. Consider an edge of D{T). Each such edge cor- 
responds to a path P in D{S), which could just be the same 
edge. Note that if P is more than a single edge, then each 
node along P branches to at least one string in S — T. More- 
over, the string(s) for any such node are disjoint from the 
string(s) for any other node missing from D{T) but in D{S). 
Since each string has a probability of 1/2 of being in S — T, 
this implies that the expected number of nodes along P is 
0(1). □ 

Applying this lemma to the skip-web framework implies 
that we can perform trie searches for an arbitrary character 
string using O(logn) messages, even if the underlying trie 
has depth 0(n). The subqueries performed by each host 
in this case involves finding the first place where a query 
substring differs with the string defined by a substring asso- 
ciated with a link in a trie. 

3.3 Trapezoidal Maps 

In this section, we outline a construction of a skip-web 
for trapezoidal maps. A trapezoidal map D{S) for a set S 
of non-crossing line segments in the plane is a subdivision 
of the plane defined by the input segments and additional 
segments formed by extending vertical rays up and down 
from each segment endpoint until it hits another segment 
or extends to infinity. (See Figure 0]) As with our other 
skip-web applications, we begin with a set-splitting lemma. 

Lemma 5. Let S be any set of disjoint line segments, let 
X be any point in the complement of S, let T be formed by 



choosing each member of S independently with probability 
1/2, and let t be the trapezoid containing x in D{T). Then 
the expected number of trapezoids in D{S) that conflict with 
t is 0(1). 



Proof. First note that, for any trapezoid t in D{T), the 
number of conflicts is exactly 1 -f a -I- 26 -I- 3c, where a is 
the number of segments of S that cut all the way across 
t and have no endpoints interior to t, b is the number of 
segments of S with one endpoint interior to t, and c is the 
number of segments of 5* with both endpoints interior to 
t. This equation can be shown by induction on \S — T\: if 
15* — T| — 0, the number of conflicts is 1 {t conflicts with 
itself) and each additional segment added to get from T to 
S increases the number of conflicting trapezoids by one (if 
it cuts t and has no vertex in t), by two (if it has one vertex 
in t), or three (if it has both vertices in t). 

To show that the expected values of a, b, and c are 0{1), 
let us consider bounding a (the methods for b and c are 
similar). Given a trapezoid t of D{T), note that in order for 
t to exist in D{T) none of the segments of 5* that intersect 
the interior of t could have been chosen to be in T. The 
expected value of a for t, therefore, can be bounded by 

n 

iPr(none of i segments cutting t were chosen to be in T). 

i=0 

Since each segment of S has probability 1/2 of being chosen 
to be in T, this bound is at most 

T- 

i=0 ^ 

which is 0(1). □ 



4. UPDATES IN A SKIP- WEB 

Having described how a skip-web structure can be used 
for efficiently performing query routing, we describe in this 

section how to efficiently perform updates in a skip-web. 
For the sake of simplicity, we restrict our attention here to 
the insertion of a new element of the ground sot S, assuming 
that there is only a single update being performed at a time, 
that all updates eventually complete, and that queries are 
temporarily blocked at nodes being updated (until the up- 
date completes). Deletion is similar (and actually simpler). 

Suppose some host h in our poor-to-pcer network wishes to 
insert a new element x of the ground set S. We begin by per- 
forming a search to locate x in the level-0 structure, D{S). 
We then update this structure to construct D{SL){x}). For 
one-dimensional linked lists, quadtrees, octrees, and tries, 
this update involves the insertion of 0(1) nodes and links. 
For a trapezoidal map, it could involve the creation of an 
output-sensitive number of new nodes and links, however, 
so let us assume for this application that we are considering 
the insertion of a new trapezoid (which we would amortize 
against the output-sensitive term in an insertion bound). 
So let us assume that the number of new nodes and links 
created by the insertion is 0(1). 

Having added the new nodes and links to go from D{S) to 
D{SLI{x}), we then generate [logn] random bits, which will 
determine which higher-level structures x should be added 
to. We add x to the structures for these levels in a bottom- 
up fashion, starting from the nodes and links conflicting with 
the 0(1) nodes and links that we replaced when adding x to 

5. Given these conflicts, we can "move up" to the higher- 
level structure, add new nodes and links for the insertion 
of X, and then continue the bottom-up procedure. By our 
assumption, the number of messages needed to update each 
level is 0(1); hence, the expected message complexity is 
O(logn). In fact, in the case of one-dimensional data, the 
expected message complexity is O (log n/ log log n), since we 
only need to send new messages to the structures on ba- 
sic levels (any block splits we need to make as the result 
of an insertion can be amortized against O(logn) previous 
insertions that led to that split). 
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